O’Hara on GitHub


1 Summary

Read in taxonomic traits filled in by taxon experts and cleaned in prior scripts. Combine with coded sensitivity, adaptive capacity, and exposure from stressor-trait sheets to calculate vulnerability.

2 Data

_raw_data/xlsx/master_all_taxa_trait_data.xlsx is the raw workbook prepared by Nathalie Butt from the various submissions of the taxa-group experts. This has been processed and cleaned to _data/spp_traits_valid.csv. See earlier scripts in the process.

trait_stressor_rankings/final_scores_all_stressors_traits.xlsx is a workbook with each sheet indicating sensitivity or adaptive capacity; columns in each sheet indicate stressors, and rows indicate traits.
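
As a rough illustration of that layout, a sheet with traits in rows and stressors in columns can be reshaped to long format for joining. The tibble below is a made-up stand-in for one sheet: the trait values, stressor columns, scores, and the names `sens_sheet` and `sens_long` are hypothetical, not taken from the workbook.

```r
library(tibble)
library(dplyr)
library(tidyr)

### Hypothetical stand-in for one sensitivity sheet: one row per trait value,
### one column per stressor. All values are invented for illustration.
sens_sheet <- tribble(
  ~category,    ~trait,       ~trait_value, ~oa,    ~water_temp,
  'physiology', 'biomineral', 'calcite',    'high', 'na',
  'physiology', 'biomineral', 'none',       'no',   'na',
  'movement',   'flight',     'yes',        'na',   'low'
)

### Reshape to long format: one row per trait value x stressor combination
sens_long <- sens_sheet %>%
  pivot_longer(cols = -c(category, trait, trait_value),
               names_to = 'stressor', values_to = 'sens_rank')
```

In long format, each rank can be converted to a numeric score and joined to species traits by category, trait, and trait value.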

3 Methods

3.1 Use scored sensitivity traits to stressors against species scored to traits

Set up a function to consistently clean trait values. Trait values in the species trait file are already cleaned and adjusted in many cases to get around mismatches; they are generally lower case, no punctuation except for greater/less than signs.

This function also cleans up category and trait names for consistency. All lower case, punctuation and spaces replaced with underscores. The species trait file is already cleaned in this manner.

clean_traitnames <- function(df, overwrite_clean_col = FALSE) {
  df <- df %>% 
    mutate(category = str_replace_all(category, '[^A-Za-z0-9]+', '_') %>% tolower(),
           category = str_replace_all(category, '^_|_$', ''),
           trait    = str_replace_all(trait, '[^A-Za-z0-9]+', '_') %>% tolower(),
           trait    = str_replace_all(trait, '^_|_$', ''))
  if(!overwrite_clean_col && ('trait_value' %in% names(df))) {
      return(df) ### without overwriting existing trait_value
  }
  if(overwrite_clean_col && ('trait_value' %in% names(df))) {
    ### ask for confirmation before overwriting (interactive sessions only)
    x <- readline(prompt = 'Overwriting existing trait_value column? y/n ')
    if(str_detect(tolower(x), '^n')) stop('Aborted: trait_value column not overwritten')
  }
  ### overwrite existing, or add new
  df <- df %>%
    mutate(trait_value = str_replace_all(tolower(trait_value), '[^0-9a-z<>]', ''))
  
  return(df)
}

clean_traitvals <- function(df) {
  x <- df$trait_value
  ### First: remove numeric commas
  y <- str_replace_all(x, '(?<=[0-9]),(?=[0-9])', '') %>%
    ### then: drop all non-alphanumeric and a few key punctuation:
    str_replace_all('[^0-9a-zA-Z<>,;\\-\\.\\(\\)/ ]', '') %>% 
    ### lower case; do it after dropping any weird non-ascii characters:
    tolower() %>% 
    str_trim() %>%
    str_replace_all('n/a', 'na') %>%
    ### convert remaining commas and slashes to semicolons:
    str_replace_all('[,/]', ';') %>%
    ### drop spaces after numbers e.g. 3 mm -> 3mm:
    str_replace_all('(?<=[0-9]) ', '') %>%
    ### drop spaces before or after punctuation (non-alphanumeric):
    str_replace_all(' (?=[^a-z0-9\\(])|(?<=[^a-z0-9\\)]) ', '') %>%
    ### manually fix some valid slashes:
    str_replace_all('nearly sessile;sedentary', 'nearly sessile/sedentary') %>%
    str_replace_all('live birth;egg care', 'live birth/egg care') %>%
    str_replace_all('chitin;caco3mix', 'chitin/caco3 mix') %>%
    str_replace_all('0.5-49mm', '0.5mm-49mm')
    
  df$trait_value <- y
  return(df)
}

assign_rank_scores <- function(x) {
  y <- tolower(as.character(x))
  ### numeric ranks pass through; suppress the coercion warnings that
  ### non-numeric ranks (e.g. 'high') would otherwise generate
  y_num <- suppressWarnings(as.numeric(y))
  z <- case_when(!is.na(y_num)         ~ y_num,
                 str_detect(y, '^na')  ~ NA_real_, ### na, n/a
                 str_detect(y, '^n')   ~ 0.00, ### none, no
                 str_detect(y, '^lo')  ~ 0.33,
                 str_detect(y, '^med') ~ 0.67,
                 str_detect(y, '^hi')  ~ 1.00,
                 str_detect(y, '^y')   ~ 1.00, ### yes
                 TRUE                  ~ NA_real_) ### basically NA
  return(z)
}

Since the species trait file is already cleaned, DO NOT use the clean_traitvals function - it will overwrite the trait_value column.
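
For reference, the cleaning conventions used in `clean_traitnames` can be tried on a few strings. This is only a sketch: the raw names and values below are hypothetical examples of what an expert-filled sheet might contain, and the regular expressions are copied from the function above.

```r
library(stringr)

### Hypothetical raw trait names as they might appear in a submitted sheet
raw_names <- c('Adult mobility', 'EOO (range)', 'Depth: min/max')

### Name cleaning: collapse runs of non-alphanumerics to underscores,
### lower-case, then trim leading/trailing underscores
clean_names <- str_replace_all(raw_names, '[^A-Za-z0-9]+', '_')
clean_names <- tolower(clean_names)
clean_names <- str_replace_all(clean_names, '^_|_$', '')
clean_names
### "adult_mobility" "eoo_range" "depth_min_max"

### Value cleaning: lower-case, then keep only digits, letters, and < >
raw_vals   <- c('> 1,000 mm', 'Very High!')
clean_vals <- str_replace_all(tolower(raw_vals), '[^0-9a-z<>]', '')
clean_vals
### ">1000mm" "veryhigh"
```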

3.1.1 Check matching

Unmatched traits between sensitivity scoring sheets and species trait sheets:

These traits are in the species-trait scoring sheets but not found in the sensitivity trait scores (should be adaptive capacity/exposure traits only):

number_of_sites, number_of_sites_incl_terrestrial_wetlands, adult_mobility, planktonic_larval_duration_pld_exposure, age_to_1st_reproduction_generation_time, are_there_sub_populations, can_the_sex_ratio_be_altered_by_a_stressor, fecundity, global_population_size, lifetime_reproductive_opportunities, max_age, parental_investment, post_birth_hatching_parental_dependence, reproductive_strategy, depth_min_max, eoo_range, zone, if_one_few_size, sub_population_dependence_on_particular_sites

These traits are in the trait-sensitivity scoring sheet but not found in the species scoring (need to be scored for species):

photosynthetic
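
A matching check of this kind can be sketched with `setdiff()` in each direction. The two name vectors here are short hypothetical examples; in practice they would be the distinct trait names from the species-trait data and from the sensitivity scoring sheet.

```r
### Hypothetical trait-name vectors, for illustration only
spp_trait_names  <- c('biomineral', 'flight', 'fecundity', 'max_age')
sens_trait_names <- c('biomineral', 'flight', 'photosynthetic')

### In species-trait sheets but not in sensitivity scores
in_spp_not_sens <- setdiff(spp_trait_names, sens_trait_names)
### In sensitivity scores but not in species-trait sheets
in_sens_not_spp <- setdiff(sens_trait_names, spp_trait_names)

in_spp_not_sens  ### "fecundity" "max_age"
in_sens_not_spp  ### "photosynthetic"
```

The same two-way comparison applies to the general and specific adaptive capacity sheets.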

3.1.3 Sensitivity to top three stressors by taxon

3.2 Score general adaptive capacity

General adaptive capacity traits are basically related to the overall population’s resilience in the face of a threat. Large extents of occurrence, large population sizes, presence of multiple subpopulations, and reproductive strategies fall into this category.

3.2.1 Check matching

Unmatched traits between general adaptive capacity scoring sheet and species trait sheets:

Traits in species-trait sheets not in general adcap scores: adult_body_mass_body_size, biomineral, calcium_carbonate_structure_location, calcium_carbonate_structure_stages, communication_requirement_sound, extreme_pressure_wave_sensitive_structures, flight, respiration_structures, adult_mobility, planktonic_larval_duration_pld_exposure, dissolved_oxygen, ph, salinity, sensitivity_to_wave_energy_physical_forcing, thermal_sensitivity_to_heat_spikes_heat_waves, thermal_sensitivity_to_ocean_warming_max_temps_tolerated, can_the_sex_ratio_be_altered_by_a_stressor, feeding_larva_post_hatching_metamorphosis, depth_min_max, zone, across_stage_dependent_habitats_condition, air_sea_interface, dependent_interspecific_interactions, extreme_diet_specialization, terrestrial_and_marine_life_stages, within_stage_dependent_habitats_condition, if_one_few_size, navigation_requirements_light, navigation_requirements_sound, navigation_requirements_magnetic

Traits in general adcap scores but not in spp traits:

Summary statistics of general adaptive capacity scores:

  • Median: 6.67
  • Mean: 6.4890888
  • Standard Deviation: 2.262208

3.3 Score specific adaptive capacity

Specific adaptive capacity traits are basically related to an organism’s ability to avoid or mitigate exposure, primarily through movement and larval dispersal.

3.3.1 Check matching

Unmatched traits between specific adaptive capacity scoring sheet and species trait sheets:

Traits in species-trait sheets, not in specific adaptive capacity scores:

adult_body_mass_body_size, biomineral, calcium_carbonate_structure_location, calcium_carbonate_structure_stages, communication_requirement_sound, extreme_pressure_wave_sensitive_structures, flight, respiration_structures, number_of_sites, number_of_sites_incl_terrestrial_wetlands, dissolved_oxygen, ph, salinity, sensitivity_to_wave_energy_physical_forcing, thermal_sensitivity_to_heat_spikes_heat_waves, thermal_sensitivity_to_ocean_warming_max_temps_tolerated, age_to_1st_reproduction_generation_time, are_there_sub_populations, can_the_sex_ratio_be_altered_by_a_stressor, fecundity, feeding_larva_post_hatching_metamorphosis, global_population_size, lifetime_reproductive_opportunities, max_age, parental_investment, post_birth_hatching_parental_dependence, reproductive_strategy, eoo_range, across_stage_dependent_habitats_condition, air_sea_interface, dependent_interspecific_interactions, extreme_diet_specialization, terrestrial_and_marine_life_stages, within_stage_dependent_habitats_condition, if_one_few_size, sub_population_dependence_on_particular_sites, navigation_requirements_light, navigation_requirements_sound, navigation_requirements_magnetic

Traits in specific ad cap scores, not in spp-traits:

3.3.2 specific adaptive capacity by stressor and species group

stressor                           median  mean      sd
air_temp                           1.00    1.095631  0.5109549
biomass_removal                    3.00    2.793224  1.1457527
disease_pathogens                  3.00    2.814252  1.1765079
entanglement                       3.67    3.316589  1.0171061
eutrophication_nutrient_pollution  3.34    3.201110  0.9484584
habitat_loss_degradation           4.00    3.686916  1.2981424
inorganic_pollution                2.67    2.878213  0.9648519
invasive_species                   3.00    2.593458  1.1084594
light_pollution                    3.00    3.393294  1.1664027
noise_pollution                    3.67    3.526565  1.2997845
oa                                 3.67    3.731787  1.2930291
oceanographic                      3.00    2.955911  1.2822757
organic_pollution                  2.67    2.878213  0.9648519
plastic_pollution                  3.00    3.017979  1.2279242
poisons_toxins                     3.00    2.917921  0.9447646
salinity                           3.34    3.241133  1.0418816
sedimentation                      3.00    2.917535  0.9454804
slr                                2.00    1.761682  0.9955358
storm_disturbance                  3.67    3.477944  1.2101320
uv                                 2.00    2.111811  1.3512386
water_temp                         3.33    3.033481  1.1064766
wildlife_strike                    2.67    2.416752  1.3828451
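
A per-stressor summary like the one above could be produced along these lines. This is a sketch on invented data: the toy scores and the name `adcap_spec_toy` are hypothetical, and only the column name `adcap_spec_score` is borrowed from the combining code later in this document.

```r
library(tibble)
library(dplyr)

### Invented specific adaptive capacity scores by species group and stressor
adcap_spec_toy <- tribble(
  ~spp_gp, ~stressor, ~adcap_spec_score,
  'gp_a',  'slr',     1,
  'gp_b',  'slr',     2,
  'gp_c',  'slr',     3,
  'gp_a',  'uv',      2,
  'gp_b',  'uv',      2
)

### Median, mean, and standard deviation of the score within each stressor
adcap_spec_summary <- adcap_spec_toy %>%
  group_by(stressor) %>%
  summarize(median = median(adcap_spec_score),
            mean   = mean(adcap_spec_score),
            sd     = sd(adcap_spec_score),
            .groups = 'drop')
```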

3.3.3 Adaptive capacity to top three stressors by taxon

3.4 Assign exposure potential modifier

The exposure potential modifier checks whether the depth zones and oceanic zones in which a stressor occurs overlap with the depth zones and oceanic zones occupied by the species. These traits fall into the “spatial scale” category, with the exception of EOO.
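
The overlap check can be sketched as a join on zone followed by a 0/1 flag. All zone names, species groups, and stressor-zone assignments below are invented for illustration; the objects `spp_zones` and `stressor_zones` are hypothetical and do not reflect the actual trait coding.

```r
library(tibble)
library(dplyr)
library(tidyr)

### Invented zones occupied by each species group
spp_zones <- tribble(
  ~spp_gp, ~zone,
  'gp_a',  'epipelagic',
  'gp_a',  'mesopelagic',
  'gp_b',  'bathypelagic'
)

### Invented zones in which each stressor occurs
stressor_zones <- tribble(
  ~stressor,         ~zone,
  'uv',              'epipelagic',
  'noise_pollution', 'epipelagic',
  'noise_pollution', 'mesopelagic'
)

### Species-stressor pairs that share at least one zone
overlap <- spp_zones %>%
  inner_join(stressor_zones, by = 'zone') %>%
  distinct(spp_gp, stressor) %>%
  mutate(exposure_mod = 1)

### All pairs; those with no shared zone get a modifier of 0
exposure <- expand_grid(spp_gp   = unique(spp_zones$spp_gp),
                        stressor = unique(stressor_zones$stressor)) %>%
  left_join(overlap, by = c('spp_gp', 'stressor')) %>%
  mutate(exposure_mod = coalesce(exposure_mod, 0))
```

In this toy case `gp_a` has exposure potential for both stressors, while `gp_b`, which occupies only the invented deep zone, has none.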

3.4.1 These species are not listed as potential exposure to these stressors:

Note: this is exposure potential only, based on overlap between species presence and stressor presence; it says nothing about sensitivity or actual exposure. Check that these results are logical.

3.4.2 These species drop out of the exposure potential calculation

Check the spp traits for these species to identify proper assignment of at least one depth zone or ocean zone.

3.5 Determine habitats for habitat loss/degradation stressor

Habitat loss and degradation can be considered as an exposure variable, in the same way as potential exposure above. However, in this case, we consider only one stressor.

Questions to consider that will affect scoring/weighting:

3.5.1 Is there an actionable difference between across-stage and within-stage dependence?

  • Multiple habitats in the “within-stage” category seem to suggest that a species can move among habitats and is therefore less sensitive to degradation of any one habitat in its range. This is a “parallel habitats” interpretation.
    • However, some species may depend on various habitats in a “series habitat” interpretation, e.g., birds that depend on one habitat type for nesting/breeding, another type for forage, and a third for stopovers in migration. In this case, harm to any would present a bottleneck.
  • Multiple habitats in the “across-stage” category seem to indicate a “series” interpretation - e.g., a fish species whose larvae grow in mangroves, then adults move to reefs.
    • However, stages could survive in multiple habitats (e.g., parallel).
  • Because the trait category is not well defined, we cannot systematically distinguish between series and parallel interpretations for either across- or within-stage dependence.
  • A series interpretation would sum the vulnerabilities; a parallel interpretation would take an average. Which is most conservative? Parallel would communicate the less alarming results, and has the advantage that it also avoids overweighting based on the number of habitats scored.

To score this, we lump together all unique listed habitats, regardless of within- or across-stage dependence. We can then calculate the habitat degradation vulnerability as normal, and append the habitat name onto the hab loss/degradation stressor name…
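
A minimal sketch of the lumping step, on invented data: the habitats listed and the exact format of the combined stressor name (`habitat_loss_degradation_<habitat>`) are assumptions for illustration, not the naming used in the actual analysis.

```r
library(tibble)
library(dplyr)

### Invented habitat dependencies for one species group; the trait names
### follow the within-/across-stage trait names listed earlier
spp_habs <- tribble(
  ~spp_gp, ~trait,                                      ~habitat,
  'gp_a',  'within_stage_dependent_habitats_condition', 'coral reef',
  'gp_a',  'across_stage_dependent_habitats_condition', 'mangrove',
  'gp_a',  'across_stage_dependent_habitats_condition', 'coral reef'
)

### Lump within- and across-stage habitats: keep each habitat once per group,
### then build a habitat-specific stressor name (format is hypothetical)
spp_habs_lumped <- spp_habs %>%
  distinct(spp_gp, habitat) %>%
  mutate(stressor = paste('habitat_loss_degradation',
                          gsub(' ', '_', habitat), sep = '_'))
```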

4 Combine scores

We will try a calculation for vulnerability \(V\) of species \(i\) to stressor \(j\) that basically looks like this:

\[\text{sensitivity score } S_{i,j} = \mathbf{s}_j^T \mathbf{t}_i\] based on a vector \(\mathbf{s}_j\) of trait-based sensitivity to stressor \(j\), and vector \(\mathbf{t}_i\) of traits of species \(i\);

\[\text{specific adaptive capacity score } K_{i,j} = \mathbf{k}_j^T \mathbf{t}_i\] based on vector \(\mathbf{k}_j\) of trait-based specific adaptive capacity to stressor \(j\);

\[\text{general adaptive capacity score } G_{i} = \mathbf{g}^T \mathbf{t}_i\] based on vector \(\mathbf{g}\) of trait-based general adaptive capacity;

\[\text{exposure potential modifier } E_{i,j} = \begin{cases} 1 & \text{when }\mathbf{e}_j^T \mathbf{t}_i > 0\\ 0 & \text{otherwise} \end{cases}\] based on vector \(\mathbf{e}_j\) of trait-based presence of stressor \(j\) (i.e., depth zones and ocean zones in which the stressor occurs).

\[\text{vulnerability } V_{i,j} = \frac{S_{i,j} / {S_j}'}{1 + G_i/ {G}' + K_{i,j}/ {K_j}'} \times E_{i,j}\]

Each component (\(S_{i,j}, G_i, K_{i,j}\)) is normalized by a reference value (\(S_{j}', G', K_{j}'\), using mean, median, max, etc.) for that component across all species: per stressor for \(S_{j}'\) and \(K_{j}'\), overall for \(G'\). The choice of reference matters. The median risks a reference of zero for stressors with few sensitivities (e.g., light pollution), and the mean risks a very low reference for the same stressors. The max risks being driven by an outlier, but here the sensitivity scores are generally capped at some low-ish value, since only a finite number of traits can confer sensitivity. Therefore, we will use max as the reference point. In a future iteration, we may wish to consider the max possible score, which may differ from the max observed.
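
A small worked example may help make the formula concrete. All numbers below are invented, and `vuln_score` is a hypothetical helper written for this illustration only; it is not part of the analysis code.

```r
### Invented trait-based sensitivity weights for one stressor (s_j) and an
### invented 0/1 trait vector for one species (t_i)
s_j  <- c(1.00, 0.33, 0.67)
t_i  <- c(1, 0, 1)
S_ij <- sum(s_j * t_i)   ### dot product s_j^T t_i, here 1.67

### Vulnerability from raw scores and their reference (e.g. max) values
vuln_score <- function(S, G, K, S_ref, G_ref, K_ref, E = 1) {
  (S / S_ref) / (1 + G / G_ref + K / K_ref) * E
}

### Invented adaptive capacity scores and reference values:
### S/S' = 0.5, G/G' = 0.5, K/K' = 0.25, so V = 0.5 / 1.75, about 0.286
V_ij <- vuln_score(S = S_ij, G = 4, K = 1, S_ref = 3.34, G_ref = 8, K_ref = 4)

### With no exposure potential (E = 0), vulnerability is zero
V_unexposed <- vuln_score(S = S_ij, G = 4, K = 1,
                          S_ref = 3.34, G_ref = 8, K_ref = 4, E = 0)
```

Because adaptive capacity enters the denominator as \(1 + G_i/G' + K_{i,j}/K_j'\), a species with no adaptive capacity keeps its full rescaled sensitivity, and one at the reference value for both components has it divided by three.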

For species groups with NA in specific adaptive capacity, force to zero (no matching adaptive traits); for species with NA in exposure potential, force to 1 (assume exposure potential).

These results will be saved by species group for now, for future matching to the species level.

spp_vulnerability <- spp_sens %>%
  left_join(spp_adcap_gen, by = c('taxon', 'spp_gp')) %>%
  left_join(spp_adcap_spec, by = c('taxon', 'spp_gp', 'stressor')) %>%
  left_join(spp_exposure, by = c('taxon', 'spp_gp', 'stressor')) %>%
  left_join(spp_dep_habs, by = c('spp_gp', 'stressor')) %>%
  ### fix NAs
  mutate(adcap_spec_score = ifelse(is.na(adcap_spec_score), 0, adcap_spec_score),
         exposure_mod     = ifelse(is.na(exposure_mod), 1, exposure_mod)) %>%
  rename(adcap_spec_raw = adcap_spec_score, adcap_gen_raw = adcap_gen_score, sens_raw = sens_score) %>%
  ### calculate reference maxima.  General adcap is overall; specific adcap and sensitivity are by stressor
  group_by(stressor) %>%
  mutate(max_adcap_spec = max(adcap_spec_raw),
         max_sens = max(sens_raw)) %>%
  ungroup() %>%
  mutate(max_adcap_gen = max(adcap_gen_raw)) %>%
  ### rescale components
  mutate(sens_rescale = sens_raw / max_sens,
         adcap_gen_rescale = adcap_gen_raw / max_adcap_gen,
         adcap_spec_rescale = adcap_spec_raw / max_adcap_spec) %>%
  ### note any stressors with zero for max (e.g. adcap for entanglement),
  ### will result in a NaN - convert the rescaled score to zero
  mutate(sens_rescale = ifelse(is.na(sens_rescale), 0, sens_rescale),
         adcap_gen_rescale = ifelse(is.na(adcap_gen_rescale), 0, adcap_gen_rescale),
         adcap_spec_rescale = ifelse(is.na(adcap_spec_rescale), 0, adcap_spec_rescale)) %>%
  ### mash 'em all together
  mutate(vuln = (sens_rescale / (1 + adcap_gen_rescale + adcap_spec_rescale)) * exposure_mod)

spp_vuln_rescale <- spp_vulnerability %>%
  select(-starts_with('max')) %>%
  ungroup() %>%
  mutate(vuln_raw = vuln,
         vuln = vuln_raw / max(vuln_raw))
  
write_csv(spp_vuln_rescale, here('_output/spp_gp_vulnerability.csv'))

4.0.3 Vulnerability to top three stressors by taxon

taxon                 stressor                           vuln
cephalopods           oceanographic                      0.5635096
cephalopods           eutrophication_nutrient_pollution  0.5023173
cephalopods           inorganic_pollution                0.5017519
corals                oceanographic                      0.7378219
corals                salinity                           0.6658616
corals                eutrophication_nutrient_pollution  0.5751448
crustacea_arthropods  plastic_pollution                  0.5313342
crustacea_arthropods  light_pollution                    0.5013530
crustacea_arthropods  biomass_removal                    0.4520452
echinoderms           oceanographic                      0.7288828
echinoderms           eutrophication_nutrient_pollution  0.6037158
echinoderms           water_temp                         0.5861822
elasmobranchs         plastic_pollution                  0.5852906
elasmobranchs         biomass_removal                    0.4708993
elasmobranchs         oceanographic                      0.4436368
fish                  salinity                           0.4475474
fish                  inorganic_pollution                0.4374225
fish                  plastic_pollution                  0.4320344
marine_mammals        biomass_removal                    0.7635599
marine_mammals        entanglement                       0.7375976
marine_mammals        wildlife_strike                    0.6910002
molluscs              plastic_pollution                  0.4903678
molluscs              eutrophication_nutrient_pollution  0.4866001
molluscs              oa                                 0.4833515
plants_algae          biomass_removal                    0.4336315
plants_algae          entanglement                       0.3055551
plants_algae          organic_pollution                  0.3037102
polychaetes           plastic_pollution                  0.2229183
polychaetes           poisons_toxins                     0.1140674
polychaetes           sedimentation                      0.1140674
reptiles              biomass_removal                    0.6998888
reptiles              invasive_species                   0.6376445
reptiles              entanglement                       0.6142824
seabirds              invasive_species                   0.5716972
seabirds              biomass_removal                    0.5613882
seabirds              storm_disturbance                  0.4947777
sponges               light_pollution                    0.6350834
sponges               organic_pollution                  0.3653473
sponges               inorganic_pollution                0.2736684
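
A top-three table of this kind could be built roughly as follows. This sketch uses invented values and assumes the taxon-level score is the mean across species groups; the source does not state how species-group scores were aggregated to taxon, so that step is a guess.

```r
library(tibble)
library(dplyr)

### Invented species-group vulnerability scores for a single taxon
vuln_toy <- tribble(
  ~taxon,   ~spp_gp, ~stressor,       ~vuln,
  'corals', 'gp_a',  'oceanographic', 0.8,
  'corals', 'gp_b',  'oceanographic', 0.6,
  'corals', 'gp_a',  'salinity',      0.6,
  'corals', 'gp_a',  'oa',            0.5,
  'corals', 'gp_a',  'uv',            0.1
)

top3 <- vuln_toy %>%
  ### assumed aggregation: mean vulnerability across species groups
  group_by(taxon, stressor) %>%
  summarize(vuln = mean(vuln), .groups = 'drop') %>%
  ### keep the three highest-scoring stressors per taxon
  group_by(taxon) %>%
  slice_max(order_by = vuln, n = 3, with_ties = FALSE) %>%
  ungroup()
```

Setting `with_ties = FALSE` guarantees exactly three rows per taxon even when scores tie, as the polychaetes scores for poisons_toxins and sedimentation do in the table above.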

5 TO DO

  • get traits for more species
    • where are the rest of the corals?
    • more fish?
    • extract some values from FishBase and SeaLifeBase perhaps
    • note these can be gapfilled but more info at spp level = better
  • Idea from Mel: Map the difference between maximum level impact (running the analysis with all pressures at high values) and observed impact.
    • A large difference would indicate regions that have high vulnerability but low impact, which might be good for conservation prioritization.
    • One complication is constraining certain pressures that aren’t global (land-based nutrient pollution will generally not be a problem beyond coastal areas).
    • I have always liked the idea of using this method to identify high risk species based on average global impact (or, similar metric).
    • It might also be useful to map just these species to see where (and which methods of) conservation might be effective if the goal is protecting vulnerable species.